Focal adhesion kinase-YAP signaling axis drives drug-tolerant persister cells and residual disease in lung cancer

Targeted therapy is effective in many tumor types, including lung cancer, the leading cause of cancer mortality. Paradigm-defining examples are targeted therapies directed against non-small cell lung cancer (NSCLC) subtypes with oncogenic alterations in EGFR, ALK and KRAS. The success of targeted therapy is limited by drug-tolerant persister cells (DTPs), which withstand and adapt to treatment and comprise the residual disease state that is typical during treatment with clinical targeted therapies. Here, we integrate studies in patient-derived and immunocompetent lung cancer models and clinical specimens obtained from patients on targeted therapy to uncover a focal adhesion kinase (FAK)-YAP signaling axis that promotes residual disease during oncogenic EGFR-, ALK-, and KRAS-targeted therapies. Inhibition of FAK-YAP signaling combined with the primary targeted therapy suppressed residual drug-tolerant cells and enhanced tumor responses. This study unveils a FAK-YAP signaling module that promotes residual disease in lung cancer, together with mechanism-based therapeutic strategies to improve tumor response.


(v1.18.1), ggplot2 (v.3.3.3), Seurat (v.3.2.2), DESeq, GSEA 4.0.1, ANNOVAR, FlowJo and Kaluza, Aperio ImageScope

RNAseq analysis: RNA was extracted from snap-frozen tissue or cell pellets. For tissue samples, tissue was minced using a liquid nitrogen-cooled mortar and pestle before RNA extraction. RNA isolation was performed using the RNeasy Mini kit (Qiagen), including an on-column DNase I digestion. RNA quality was assessed by automated electrophoresis using the RNA 6000 Pico Kit and an Agilent 2100 BioAnalyzer (Agilent Technologies, Inc.). RNA was quantified using the Qubit RNA HS Assay Kit and a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Library preparation and paired-end 150 bp (PE150, Illumina) RNA sequencing were performed by Novogene (Novogene Corporation, Sacramento, USA). RNA-Seq reads were mapped to the hg19 reference genome using STAR (Spliced Transcripts Alignment to a Reference, v2.4.2a). Expression levels in transcripts per million (TPM) were quantified using the RNA-Seq by Expectation-Maximization algorithm (RSEM v1.2.29). The quantified expression of 26,334 transcripts (including coding and non-coding genes) was processed in RStudio. Differentially expressed genes between tumor and normal samples were identified using the edgeR algorithm. Gene set enrichment analysis was done using GSEA 4.0.1 software.

scRNA sequencing trajectory: A BFP-tagged barcode library (Addgene #85968) was delivered via lentiviral infection into isogenic EGFR-mutant PC9-C2 and H1975-B10 cells. Cells were sorted and serially titrated to allow for approximately 1000 unique barcode groups. After expansion, cells were subjected to 0.1% DMSO or 2 µM osimertinib treatment and frozen down at the indicated timepoints. Cells were thawed, hashed with TotalSeq A anti-human hashtag antibodies (BioLegend), and pooled for single-cell RNA sequencing on the 10x Chromium v3 platform (10x Genomics). Cell hash libraries were prepared as specified by BioLegend. Custom barcode amplification was performed by two rounds of PCR. Libraries were sequenced on the NovaSeq Illumina platform (Center for Advanced Technology, UCSF). After sequencing, cells were called with the 10x Cell Ranger pipeline and cell hashes were called using the scEasyMode package in Python. In addition, bulk genomic barcodes were prepared from the same time points used for single-cell RNA sequencing using the Quick Extract gDNA extraction protocol (Lucigen Corporation) and custom barcode amplification primers for NGS library preparation. A custom script that calls genomic barcodes and maps between single-cell and bulk genomic barcodes collected from the same samples was used to assess population frequency and to map these onto single-cell transcriptomes. The diversity index was calculated as $1 - \sum_i p_i^2$, where $p_i$ is the relative abundance of lineage $i$. The diversity index is at its maximum when all barcode groups are equally abundant and decreases if some barcode groups are enriched and others depleted. The index was scaled by the maximum possible index given the number of barcode groups, which is $\max(\text{lineage diversity index}) = 1 - n(1/n)^2 = 1 - 1/n$, where $n$ is the number of barcode groups. Source code for scRNA sequencing and genetic diversity assessment is available here: https://github.com/johnnyUCSF/FH_TB

Whole exome sequencing: DNA was extracted from snap-frozen cell pellets using the DNeasy Blood & Tissue kit (Qiagen). DNA quality was assessed by automated electrophoresis using the High Sensitivity DNA Kit and an Agilent 2100 BioAnalyzer (Agilent Technologies, Inc.). DNA was quantified using the Qubit dsDNA HS Assay kit and a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Library preparation and paired-end 150 bp (PE150, Illumina) DNA sequencing were performed by Novogene (Novogene Corporation, Sacramento, USA). Paired-end FASTQ files were mapped to the hg19 genome and mutations were called using the SeqMule pipeline (ref. 64). The VCF files were annotated using ANNOVAR software on a high-performance computing cluster (UCSF Helen Diller Comprehensive Cancer Center). Further analysis of annotated variants was conducted in the RStudio/R environment.
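
The lineage diversity index defined in the scRNA sequencing trajectory paragraph (one minus the sum of squared relative abundances, scaled by its maximum of 1 - 1/n) can be illustrated with a minimal Python sketch. This is not the authors' script from the linked repository; the function name `lineage_diversity` and the optional `n_groups` argument are hypothetical, and the sketch assumes that n defaults to the number of barcode groups observed in the sample.

```python
from collections import Counter


def lineage_diversity(barcodes, n_groups=None):
    """Return (raw, scaled) diversity for an iterable of barcode labels.

    raw    = 1 - sum_i p_i^2, with p_i the relative abundance of lineage i
    scaled = raw / (1 - 1/n), with n the number of barcode groups
    n_groups is a hypothetical override; by default n is the number of
    distinct barcodes observed (an assumption of this sketch).
    """
    counts = Counter(barcodes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no barcodes supplied")
    n = len(counts) if n_groups is None else n_groups
    raw = 1.0 - sum((c / total) ** 2 for c in counts.values())
    max_index = 1.0 - 1.0 / n
    # With a single barcode group the maximum is 0, so the scaled index is 0.
    scaled = raw / max_index if max_index > 0 else 0.0
    return raw, scaled


if __name__ == "__main__":
    even = ["A", "B", "C", "D"] * 25          # four equally abundant lineages
    skewed = ["A"] * 90 + ["B"] * 10          # one lineage dominates
    print(lineage_diversity(even))
    print(lineage_diversity(skewed))
```

In this toy input, the evenly distributed population reaches the maximum scaled index, while the population dominated by one barcode group scores lower, matching the behavior described in the text.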

LINCS L1000 Concordance score
The NIH LINCS L1000 database contains gene expression data from cultured human cells treated with small-molecule and genetic perturbagens. Level 4 data were sourced from the Gene Expression Omnibus series GSE70138. Expression data were restricted to small-molecule perturbagens and intersected with the residual disease signature (N = 83 genes). Using a previously published computational pipeline (refs. 69, 70), a score for each signature-drug pair was determined using a non-parametric rank-based method similar to the Kolmogorov-Smirnov test statistic, where negative scores indicate that genes in the ranked drug profile are regulated in the opposite direction to the ranked disease signature. P values for drug-gene expression profiles were determined by comparing their scores to a distribution of random scores and were adjusted using the false discovery rate (FDR; Benjamini-Hochberg, threshold = 0.05) method. Metadata and identifiers associated with each perturbagen were sourced from the iLINCS suite. For the upregulated residual disease signature, the drug-gene expression profiles chosen were those with the most negative significant scores.
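
The scoring scheme described above can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the published pipeline: it uses an unweighted Kolmogorov-Smirnov-like running sum, random gene sets of matching size as the null distribution, and a standard Benjamini-Hochberg adjustment. All function names (`ks_like_score`, `permutation_p`, `benjamini_hochberg`) and the toy gene identifiers are hypothetical.

```python
import random


def ks_like_score(ranked_genes, signature):
    """Signed maximum deviation of a running sum over a ranked drug profile.

    ranked_genes: genes ordered from most up- to most down-regulated by the drug.
    signature:    upregulated disease-signature genes.
    A negative score means signature genes cluster at the down-regulated end.
    """
    sig = set(signature) & set(ranked_genes)
    n_total, n_hit = len(ranked_genes), len(sig)
    if n_hit == 0 or n_hit == n_total:
        return 0.0
    up, down = 1.0 / n_hit, 1.0 / (n_total - n_hit)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up if gene in sig else -down
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p(ranked_genes, signature, n_perm=1000, seed=0):
    """Compare the observed score with scores of random gene sets (assumed null)."""
    rng = random.Random(seed)
    observed = ks_like_score(ranked_genes, signature)
    k = len(set(signature) & set(ranked_genes))
    extreme = sum(
        abs(ks_like_score(ranked_genes, rng.sample(ranked_genes, k))) >= abs(observed)
        for _ in range(n_perm)
    )
    return observed, (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    profile = [f"g{i}" for i in range(20)]      # toy ranked drug profile
    disease_up = ["g16", "g17", "g18", "g19"]   # toy signature at the bottom
    print(permutation_p(profile, disease_up, n_perm=200))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the toy example the signature genes sit at the down-regulated end of the drug profile, so the score is negative, which corresponds to a drug profile that opposes the disease signature.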

Data
These data are available as an NCBI BioProject under accession number PRJNA766057. For single-cell RNA-seq analyses of patient specimens, the data are derived from a previously published study and are available as an NCBI BioProject under accession number PRJNA591860. Plasmids, data and code generated are available on request from the corresponding author.
Research involving human participants, their data, or biological material

Reporting on sex and gender
Participant sex (biological attribute) is provided in the supplementary tables. No sex- or gender-based analysis was performed, as it was not relevant to and outside the scope of the current study.

Reporting on race, ethnicity, or other socially relevant groupings
Participant race is provided in the supplementary tables. No race-based analysis was performed, as it was not relevant to and outside the scope of the current study.

Other population characteristics provided but not relevant in this study: smoking history.

Recruitment
The patient data are derived from a previously published study and are available as an NCBI BioProject under accession number PRJNA591860.